Add --run-in-docker to skill-validator to run Copilot CLI in a docker container by caaavik-msft · Pull Request #176 · dotnet/skills

caaavik-msft · 2026-03-03T23:54:34Z

Summary

This PR adds an optional Docker execution mode to skill-validator so agent runs, judges, and setup commands can execute in an isolated container instead of directly on the host machine.

Motivation

The main use case for this is for local development, but it might also be useful for running in CI if we want to build on top of it. I was building some skills and found that when using some weaker models, they made destructive changes to my host system to accomplish the task (e.g. reinstalling .NET). With this, agents and judges run inside a container with only access to the files they need bound to the host machine. This does not add any additional security measures for network isolation.

Implementation

This makes use of the --headless mode for running copilot as described here: https://github.com/github/copilot-sdk/blob/main/docs/guides/setup/backend-services.md.

It requires a GITHUB_TOKEN be present to pass into the container so that it can use that to authenticate to the Copilot API. I have an example in the README which explains that you can get this token with gh auth token. For people with multiple gh accounts (e.g. personal and enterprise), you can also do gh auth token --user <name>.

A Dockerfile is included in the repo to use as the base image:

FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build

ARG COPILOT_SDK_VERSION
RUN dotnet new console -o /tmp/dl \
    && dotnet add /tmp/dl package GitHub.Copilot.SDK --version $COPILOT_SDK_VERSION \
    && dotnet build /tmp/dl -c Release \
    && cp /tmp/dl/bin/Release/net10.0/runtimes/*/native/copilot /usr/local/bin/copilot \
    && chmod +x /usr/local/bin/copilot \
    && rm -rf /tmp/dl

RUN copilot --version

This ensures that we use the exact same Copilot CLI binary that is shipped with the SDK. The SDK version is resolved programmatically inside the SkillValidator so it is kept in sync. It places the copilot binary at /usr/local/bin/copilot inside the container.

To handle path mapping/translation, when running in docker mode, all temp/work directories are placed inside a single directory in the TMP folder, and that entire directory is mounted into the container with read-write. This makes it easy to map paths to and from the host and container equivalent when needed. Skill directories are also mounted into the container with read-only access, and only the directories that are being evaluated will be mounted.

The container uses a randomised port -p 0:4321 which is resolved later using docker port. The container is always cleaned up after finishing, including on ProcessExit and CancelKeyPress events.

Future Extensibility

I have a proof of concept working locally which I chose not to push for now to keep this PR simple which runs all agents inside their own containers rather than having a single container that is used to run all agents and judges. This would help reduce any risks of agents modifying the environment and impacting other evaluations if that sounds desirable, but it does mean that each agent would use a separate CopilotClient rather a single shared CopilotClient.

…t-server

Copilot

Pull request overview

Adds an optional --run-in-docker execution mode to eng/skill-validator so Copilot CLI runs headlessly inside a Docker container, with host/container path translation and read-only skill mounts to reduce host impact during evaluations.

Changes:

Introduce DockerCopilotServer to build/run a Copilot CLI container, map work/skill paths, and exec setup commands in-container.
Wire Docker-mode path mapping into agent sessions and judges; add --run-in-docker config/flag and documentation.
Add Dockerfile to the build output and new unit tests for mount/path mapping logic.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
eng/skill-validator/src/Services/DockerCopilotServer.cs	Implements Docker image build/run, mounts, path translation, and container lifecycle management.
eng/skill-validator/src/Services/AgentRunner.cs	Uses Docker CLI URL when enabled; maps paths; runs setup commands via `docker exec`; maps permission paths back to host.
eng/skill-validator/src/Services/Judge.cs	Maps working directory to container path for judge sessions in Docker mode.
eng/skill-validator/src/Services/PairwiseJudge.cs	Maps working directory to container path for pairwise judge sessions in Docker mode.
eng/skill-validator/src/Commands/ValidateCommand.cs	Adds `--run-in-docker`, initializes Docker server before runs, and stops it during normal cleanup.
eng/skill-validator/src/Models/Models.cs	Adds `RunInDocker` to `ValidatorConfig`.
eng/skill-validator/src/SkillValidator.csproj	Copies Dockerfile into output so it’s available at runtime.
eng/skill-validator/src/Docker/Dockerfile	Builds a base image and installs the Copilot CLI binary from the SDK package.
eng/skill-validator/README.md	Documents Docker mode requirements and usage.
eng/skill-validator/tests/DockerCopilotServerTests.cs	Adds unit tests for mount computation and host/container path mapping.

Comments suppressed due to low confidence (1)

eng/skill-validator/src/Commands/ValidateCommand.cs:169

In Docker mode, DockerCopilotServer.Initialize(...) happens before early-return paths in model validation (invalid model / exception). Since GetSharedClient() can start the Docker container during model validation, returning early here can leak a running container and temp dirs. Wrap the body of Run in a try/finally that always calls StopSharedClient(), dockerServer.StopAsync(), and CleanupWorkDirs() (and/or clear DockerCopilotServer.Instance) even on early exits.

        // Set up DockerCopilotServer with skill directories to mount
        if (config.RunInDocker)
            DockerCopilotServer.Initialize(config.Verbose, allSkills);

        // Validate model early
        try
        {
            var client = await AgentRunner.GetSharedClient(config.Verbose);
            var models = await client.ListModelsAsync();
            var modelIds = models.Select(m => m.Id).ToList();
            var modelsToValidate = new List<string> { config.Model };
            if (config.JudgeModel != config.Model) modelsToValidate.Add(config.JudgeModel);

            foreach (var m in modelsToValidate)
            {
                if (!modelIds.Contains(m))
                {
                    Console.Error.WriteLine($"Invalid model: \"{m}\"\nAvailable models: {string.Join(", ", modelIds)}");
                    return 1;
                }
            }

            Console.WriteLine($"Using model: {config.Model}" +
                (config.JudgeModel != config.Model ? $", judge: {config.JudgeModel}" : "") +
                $", judge-mode: {config.JudgeMode}");
        }
        catch (Exception error)
        {
            Console.Error.WriteLine($"Failed to validate model: {error}");
            return 1;
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

eng/skill-validator/src/Services/DockerCopilotServer.cs

…t-server

JanKrivanek · 2026-03-06T08:25:22Z

eng/skill-validator/src/Services/DockerCopilotServer.cs

+            if (!Path.IsPathRooted(relativePath) &&
+                !relativePath.StartsWith(".." + Path.DirectorySeparatorChar, StringComparison.Ordinal) &&
+                relativePath != "..")
+            {
+                return Path.Combine(containerMount, relativePath).Replace("\\", "/");
+            }


NIT: This repeats and should be extracted to a helper function.

Also the separators normalization part might be needed regardles of path being rooted

Made this change, although the non-rooted path returns an exception. If the relative path is rooted, it means that they are on different drives. This code just means that only paths that are mapped inside the container will return without exception.

JanKrivanek · 2026-03-06T08:28:37Z

eng/skill-validator/src/Services/DockerCopilotServer.cs

+            "run",
+            "--name", containerName,
+            "-p", $"0:{InternalPort}", // Map internal port to a random host port
+            "-e", "GITHUB_TOKEN",


This feels like anything within the container (e.g. misbehaving skill) can now use/flush the token.
cc: @ViktorHofer

I just checked to see how the Copilot SDK uses GitHubToken and it looks like it just passes it in via an environment variable anyways https://github.com/github/copilot-sdk/blob/4e1499dd23709022c720eaaa5457d00bf0cb3977/dotnet/src/Client.cs#L1003-L1006. So I think it's not going to be practical to hide that. I think Copilot CLI would be the only practical place to fix this by having it hide the env var from any child subprocesses it creates. Also with full shell access, a misbehaving skill could always just run gh auth token to get a new token.

There is a sample here actually showing a way to solve this properly, but it would make things too complicated: https://github.com/github/copilot-sdk/tree/main/test/scenarios/bundling/container-proxy

I think Copilot CLI would be the only practical place to fix this by having it hide the env var from any child subprocesses it creates.

Damn, we should definitely fix this in the Copilot SDK. Would you mind filing an issue?

Overall, as long as we don't use this in CI where we have less control over the PAT, this is fine.

JanKrivanek

Looks good to me if it's only for local usage, not to be used in PR review pipeline (so opt-in is good here - as contributors cannot change it).

Some of this might be alleviated with the support for msbench that we are working on

…t-server

ViktorHofer · 2026-03-06T15:22:45Z

I assume you now have maintainer permissions. Can you please submit this change again from a non forked branch? Annoying, I know but we don't honor skill-validator changes in forks.

caaavik-msft · 2026-03-06T19:30:42Z

Reopened in #273

caaavik-msft added 2 commits March 4, 2026 09:17

Add --run-in-docker to run Copilot CLI in docker container

c42609a

Merge remote-tracking branch 'origin/main' into caaavik/docker-copilo…

f757de2

…t-server

caaavik-msft marked this pull request as ready for review March 6, 2026 05:01

caaavik-msft requested a review from a team March 6, 2026 05:01

caaavik-msft requested review from JanKrivanek and ViktorHofer as code owners March 6, 2026 05:01

Copilot AI review requested due to automatic review settings March 6, 2026 05:01

Copilot started reviewing on behalf of caaavik-msft March 6, 2026 05:02 View session

Copilot AI reviewed Mar 6, 2026

View reviewed changes

caaavik-msft mentioned this pull request Mar 6, 2026

Change 'skill-validator' to 'sv' in temp directory name to reduce bias during evaluation #253

Merged

caaavik-msft added 3 commits March 6, 2026 15:43

Address PR comments

c842f7b

Merge remote-tracking branch 'origin/main' into caaavik/docker-copilo…

f12f8cc

…t-server

Change temp dir prefix for docker container host dir

bba74a6

JanKrivanek reviewed Mar 6, 2026

View reviewed changes

JanKrivanek approved these changes Mar 6, 2026

View reviewed changes

caaavik-msft added 3 commits March 6, 2026 19:35

Extract path mapping logic to helper function

659a68f

Merge remote-tracking branch 'origin/main' into caaavik/docker-copilo…

d2480a0

…t-server

Re-add newline at end of README.md that disappeared in merge.

f547651

caaavik-msft mentioned this pull request Mar 6, 2026

Add --run-in-docker to skill-validator to run Copilot CLI in a docker container #273

Open

caaavik-msft closed this Mar 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add --run-in-docker to skill-validator to run Copilot CLI in a docker container#176

Add --run-in-docker to skill-validator to run Copilot CLI in a docker container#176
caaavik-msft wants to merge 8 commits intodotnet:mainfrom
caaavik-msft:caaavik/docker-copilot-server

caaavik-msft commented Mar 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JanKrivanek Mar 6, 2026

Uh oh!

caaavik-msft Mar 6, 2026

Uh oh!

JanKrivanek Mar 6, 2026

Uh oh!

caaavik-msft Mar 6, 2026

Uh oh!

caaavik-msft Mar 6, 2026

Uh oh!

ViktorHofer Mar 6, 2026

Uh oh!

JanKrivanek left a comment

Uh oh!

ViktorHofer commented Mar 6, 2026

Uh oh!

caaavik-msft commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

caaavik-msft commented Mar 3, 2026

Summary

Motivation

Implementation

Future Extensibility

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JanKrivanek Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

caaavik-msft Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

JanKrivanek Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

caaavik-msft Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

caaavik-msft Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

ViktorHofer Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

JanKrivanek left a comment

Choose a reason for hiding this comment

Uh oh!

ViktorHofer commented Mar 6, 2026

Uh oh!

caaavik-msft commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants